{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 图像的卷积\n",
    "\n",
    ":label:`sec_conv_layer`\n",
    "\n",
    "\n",
    "现在我们了解了卷积层在理论上的工作原理，\n",
    "我们准备看看它们在实践中是如何工作的。\n",
    "基于我们对卷积神经网络的研究\n",
    "作为探索图像数据结构的有效架构，\n",
    "我们坚持以图像为榜样。\n",
    "\n",
    "\n",
    "## 互关联算子\n",
    "\n",
    "回想一下，严格来说，*卷积*层\n",
    "是一个（轻微的）用词不当，因为它们表达的操作\n",
    "更准确地说是互相关。\n",
    "在卷积层中，输入数组\n",
    "和一个*相关内核*数组相结合\n",
    "通过互相关操作生成输出阵列。\n",
    "让我们暂时忽略频道，看看它是如何工作的\n",
    "具有二维数据和隐藏表示。\n",
    "In :numref:`fig_correlation`，\n",
    "输入是一个二维数组\n",
    "高度为3，宽度为3。\n",
    "我们将数组的形状标记为$3 \\times 3$ 或 ($3$, $3$).\n",
    "内核的高度和宽度都是$2$.\n",
    "请注意，在深度学习研究社区，\n",
    "*一个过滤器*，或者只是图层的*权重*。\n",
    "内核窗口的形状\n",
    "由内核的高度和宽度给出\n",
    "(这里是$2 \\times 2$)。\n",
    "\n",
    "![Two-dimensional cross-correlation operation. The shaded portions are the first output element and the input and kernel array elements used in its computation: $0\\times0+1\\times1+3\\times2+4\\times3=19$. ](https://raw.githubusercontent.com/d2l-ai/d2l-en/master/img/correlation.svg)\n",
    "\n",
    ":label:`fig_correlation`\n",
    "\n",
    "\n",
    "在二维互相关运算中，\n",
    "我们从卷积窗口开始\n",
    "在输入数组的左上角\n",
    "然后将其滑动到输入阵列上，\n",
    "从左到右，从上到下。\n",
    "当卷积窗口滑动到某个位置时，\n",
    "该窗口中包含的输入子数组\n",
    "内核数组相乘（elementwise）\n",
    "然后将得到的数组求和\n",
    "产生单个标量值。\n",
    "这个结果给出了输出数组的值\n",
    "在相应的位置。\n",
    "这里，输出数组的高度为2，宽度为2\n",
    "这四个元素是从\n",
    "二维互相关运算：\n",
    "\n",
    "$$\n",
    "0\\times0+1\\times1+3\\times2+4\\times3=19,\\\\\n",
    "1\\times0+2\\times1+4\\times2+5\\times3=25,\\\\\n",
    "3\\times0+4\\times1+6\\times2+7\\times3=37,\\\\\n",
    "4\\times0+5\\times1+7\\times2+8\\times3=43.\n",
    "$$\n",
    "\n",
    "请注意，沿每个轴，输出\n",
    "略小于输入。\n",
    "因为内核的宽度和高度都大于1，\n",
    "我们只能正确地计算互相关\n",
    "对于内核完全位于图像中的位置，\n",
    "输出大小由输入大小 $H \\times W$给出\n",
    "减去卷积内核的大小 $h \\times w$\n",
    "通过 $(H-h+1) \\times (W-w+1)$。\n",
    "这是因为我们需要足够的空间\n",
    "在图像上'移动'卷积核\n",
    "（稍后我们将看到如何保持大小不变\n",
    "通过在图像边界周围填充零\n",
    "这样就有足够的空间来移动内核）。\n",
    "接下来，我们在 `corr2d` 函数中实现这个过程，\n",
    "它接受输入数组 `X` 和内核数组 `K`\n",
    "并返回输出数组 `Y`.\n",
    "\n",
    "但首先我们将导入相关的库。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%load ../utils/djl-imports"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "public NDArray corr2d(NDArray X, NDArray K){\n",
    "    // 计算二维互关联。\n",
    "    int h = (int) K.getShape().get(0);\n",
    "    int w = (int) K.getShape().get(1);\n",
    "\n",
    "    NDArray Y = manager.zeros(new Shape(X.getShape().get(0) - h + 1, X.getShape().get(1) - w + 1));\n",
    "\n",
    "    for(int i=0; i < Y.getShape().get(0); i++){\n",
    "        for(int j=0; j < Y.getShape().get(1); j++){\n",
    "            Y.set(new NDIndex(i + \",\" + j), X.get(i + \":\" + (i+h) + \",\" + j + \":\" + (j+w)).mul(K).sum());\n",
    "        }\n",
    "    }\n",
    "\n",
    "    return Y;\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "我们可以构造输入数组`X`和内核数组`K`\n",
    "从上图来看\n",
    "验证上述实现的输出\n",
    "二维互相关运算。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "NDManager manager = NDManager.newBaseManager();\n",
    "NDArray X = manager.create(new float[]{0,1,2,3,4,5,6,7,8}, new Shape(3,3));\n",
    "NDArray K = manager.create(new float[]{0,1,2,3}, new Shape(2,2));\n",
    "System.out.println(corr2d(X, K));"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 卷积层\n",
    "\n",
    "卷积层将输入和内核相互关联\n",
    "并添加标量偏差以产生输出。\n",
    "卷积层的两个参数\n",
    "是内核和标量偏差。\n",
    "当训练基于卷积层的模型时，\n",
    "我们通常会随机初始化内核，\n",
    "就像我们使用完全连接的层一样。\n",
    "\n",
    "我们现在准备实现一个二维卷积层\n",
    "基于上面定义的 `corr2d` 函数。\n",
    "在 `ConvolutionalLayer` 构造函数中，\n",
    "我们声明 `weight` 和 `bias` 为两个类参数。\n",
    "正向计算函数 `forward`\n",
    "调用`corr2d` 函数并添加偏差。\n",
    "与 $h \\times w$ 互相关一样\n",
    "我们也提到卷积层\n",
    "作为 $h \\times w$ 卷积。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "attributes": {
     "classes": [],
     "id": "",
     "n": "70"
    }
   },
   "outputs": [],
   "source": [
    "public class ConvolutionalLayer{\n",
    "    \n",
    "    private NDArray w;\n",
    "    private NDArray b;\n",
    "    \n",
    "    public NDArray getW(){\n",
    "        return w;\n",
    "    }\n",
    "    \n",
    "    public NDArray getB(){\n",
    "        return b;\n",
    "    }\n",
    "    \n",
    "    public ConvolutionalLayer(Shape shape){\n",
    "        NDManager manager = NDManager.newBaseManager();\n",
    "        w = manager.create(shape);\n",
    "        b = manager.randomNormal(new Shape(1));\n",
    "        w.setRequiresGradient(true);\n",
    "    }\n",
    "    \n",
    "    public NDArray forward(NDArray X){\n",
    "        return corr2d(X, w).add(b);\n",
    "    }\n",
    "    \n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 图像中的目标边缘检测\n",
    "\n",
    "让我们花一点时间来分析卷积层的简单应用：\n",
    "检测图像中物体的边缘\n",
    "通过找到像素变化的位置。\n",
    "首先，我们构造一个 $6\\times 8$ 像素的'图像'。\n",
    "中间四列为黑色（0），其余为白色（1）。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "attributes": {
     "classes": [],
     "id": "",
     "n": "66"
    }
   },
   "outputs": [],
   "source": [
    "X = manager.ones(new Shape(6,8));\n",
    "X.set(new NDIndex(\":\" + \",\" + 2 + \":\" + 6), 0f);\n",
    "System.out.println(X);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "接下来，我们构造一个内核`K` ，高度为$1$，宽度为$2$。\n",
    "当我们对输入进行互相关运算时，\n",
    "如果水平相邻的元素相同，\n",
    "输出为0。否则，输出为非零。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "attributes": {
     "classes": [],
     "id": "",
     "n": "67"
    }
   },
   "outputs": [],
   "source": [
    "K = manager.create(new float[]{1, -1}, new Shape(1,2));"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "我们已经准备好执行互相关操作\n",
    "参数为`X`（我们的输入）和`K`（我们的内核）。\n",
    "正如你所看到的，我们从白色到黑色的边缘检测到1\n",
    "和-1表示从黑色到白色的边缘。\n",
    "所有其他输出值为$0$。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "attributes": {
     "classes": [],
     "id": "",
     "n": "69"
    }
   },
   "outputs": [],
   "source": [
    "NDArray Y = corr2d(X, K);\n",
    "Y"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "我们现在可以将内核应用于转置图像。\n",
    "正如所料，它消失了。内核`K`只检测垂直边缘。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "corr2d(X.transpose(), K);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 学习内核\n",
    "\n",
    "用有限差分`[1, -1]`设计一个边缘检测器非常简洁\n",
    "如果我们知道这正是我们想要的。\n",
    "然而，当我们看到更大的内核时，\n",
    "并考虑连续的卷积层，\n",
    "可能无法具体说明\n",
    "每个过滤器应该手动执行的操作。\n",
    "\n",
    "现在让我们看看是否可以学习从`X`生成`Y`的内核\n",
    "只查看（输入、输出）对。\n",
    "我们首先构造一个卷积层\n",
    "并将其内核初始化为随机数组。\n",
    "接下来，在每次迭代中，我们将使用平方误差\n",
    "将`Y` 与卷积层的输出进行比较。\n",
    "然后我们可以计算梯度来更新权重。\n",
    "为了简单起见，在这个卷积层中，\n",
    "我们将忽略这种偏见。\n",
    " \n",
    "这一次，我们将使用DJL内置的`Block`和`Conv2d`类。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "X = X.reshape(1,1,6,8);\n",
    "Y = Y.reshape(1,1,6,7);\n",
    "\n",
    "Loss l2Loss = Loss.l2Loss();"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "attributes": {
     "classes": [],
     "id": "",
     "n": "83"
    }
   },
   "outputs": [],
   "source": [
    "// 构造一个具有1个输出通道和一个\n",
    "// 形核（1，2）。为了简单起见，我们忽略了这里的偏见\n",
    "Block block = Conv2d.builder()\n",
    "                .setKernelShape(new Shape(1, 2))\n",
    "                .optBias(false)\n",
    "                .setFilters(1)\n",
    "                .build();\n",
    "\n",
    "block.setInitializer(new NormalInitializer(), Parameter.Type.WEIGHT);\n",
    "block.initialize(manager, DataType.FLOAT32, X.getShape());\n",
    "\n",
    "// 二维卷积层使用四维输入和输出\n",
    "// 输出格式为（例如，通道、高度、宽度），其中批次\n",
    "// 大小（批次中的示例数）和通道数均为1\n",
    "\n",
    "ParameterList params = block.getParameters();\n",
    "NDArray wParam = params.get(0).getValue().getArray();\n",
    "wParam.setRequiresGradient(true);\n",
    "\n",
    "NDArray lossVal = null;\n",
    "ParameterStore parameterStore = new ParameterStore(manager, false);\n",
    "\n",
    "NDArray lossVal = null;\n",
    "\n",
    "for (int i = 0; i < 10; i++) {\n",
    "\n",
    "    wParam.setRequiresGradient(true);\n",
    "\n",
    "    try (GradientCollector gc = Engine.getInstance().newGradientCollector()) {\n",
    "        NDArray yHat = block.forward(parameterStore, new NDList(X), true).singletonOrThrow();\n",
    "        NDArray l = l2Loss.evaluate(new NDList(Y), new NDList(yHat));\n",
    "        lossVal = l;\n",
    "        gc.backward(l);\n",
    "    }\n",
    "    // 更新内核\n",
    "    wParam.subi(wParam.getGradient().mul(0.40f));\n",
    "    \n",
    "    if((i+1)%2 == 0){\n",
    "        System.out.println(\"batch \" + (i+1) + \" loss: \" + lossVal.sum().getFloat());\n",
    "    }\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "请注意，经过10次迭代后，错误已降至一个较小的值。现在我们来看看我们学习的内核数组。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ParameterList params = block.getParameters();\n",
    "NDArray wParam = params.get(0).getValue().getArray();\n",
    "wParam"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "事实上，学习到的内核阵列正在接近\n",
    "到我们之前定义的内核数组`K`。\n",
    "\n",
    "## 互关联和卷积\n",
    "\n",
    "回想一下我们在信件前一节中的观察\n",
    "在互相关算子和卷积算子之间。\n",
    "上图显示了这种对应关系。\n",
    "只需将内核从左下角翻转到右上角。\n",
    "在这种情况下，总和中的索引被恢复，\n",
    "然而，同样的结果也可以得到。\n",
    "与深度学习文献中的标准术语保持一致，\n",
    "我们将继续提到互相关运算\n",
    "作为一种卷积，严格地说，它略有不同。\n",
    "\n",
    "## 总结\n",
    "\n",
    "* 二维卷积层的核心计算是二维互相关运算。在最简单的形式中，它对二维输入数据和内核执行互相关操作，然后添加偏差。\n",
    "* 我们可以设计一个内核来检测图像中的边缘。\n",
    "* 我们可以从数据中学习内核的参数。\n",
    "\n",
    "## 练习\n",
    "\n",
    "1. 构造一个带有对角边的图像`X`。\n",
    "    * 如果对其应用内核`K`，会发生什么？\n",
    "    * 如果变换`X`顺序，会发生什么？\n",
    "    * 如果变换`K`顺序，会发生什么？\n",
    "1. 当您尝试自动查找我们创建的 `Conv2d` 类的渐变时，您会看到什么样的错误消息？    \n",
    "1. 如何通过更改输入和内核数组将互相关运算表示为矩阵乘法？\n",
    "1. 手动设计一些内核。\n",
    "    * 二阶导数的核的形式是什么？\n",
    "    * 拉普拉斯算子的核心是什么？\n",
    "    * 积分的核心是什么？\n",
    "    * 获得$d$导数的内核最小大小是多少？\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Java",
   "language": "java",
   "name": "java"
  },
  "language_info": {
   "codemirror_mode": "java",
   "file_extension": ".jshell",
   "mimetype": "text/x-java-source",
   "name": "Java",
   "pygments_lexer": "java",
   "version": "14.0.2+12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
